-
-
Notifications
You must be signed in to change notification settings - Fork 16
feat!: Add working conversion webhook with cert rotation #1066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pub const WEBHOOK_CA_LIFETIME: Duration = Duration::from_minutes_unchecked(3); | ||
pub const WEBHOOK_CERTIFICATE_LIFETIME: Duration = Duration::from_minutes_unchecked(2); | ||
pub const WEBHOOK_CERTIFICATE_ROTATION_INTERVAL: Duration = Duration::from_minutes_unchecked(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder to bump these before merging. Currently they are so low for easy testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, I didn't look at the CertificateResolver
yet.
crates/stackable-webhook/src/lib.rs
Outdated
/// A generic webhook handler receiving a request and state and sending back | ||
/// a response. | ||
/// | ||
/// This trait is not intended to be implemented by external crates and this | ||
/// library provides various ready-to-use implementations for it. One such an | ||
/// implementation is part of the [`ConversionWebhookServer`][1]. | ||
/// | ||
/// [1]: crate::servers::ConversionWebhookServer | ||
pub(crate) trait StatefulWebhookHandler<Req, Res, S> { | ||
fn call(self, req: Req, state: S) -> Res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It imposed maintenance effort to me while I figured out how the code should look like.
I'm also pretty confident we are never going to need it for a conversion web-hook. Therefore I removed it to have less maintenance effort going forward.
Also, stateful conversion webhooks sounds like a potential nightmare once we want to have HA and multiple instances.
We can obviously always re-add it in the case we really need it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's leave it as is for now. We can always re-add if needed.
This then needs a changelog entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned in 0903686
options: Options, | ||
client: Client, | ||
field_manager: impl Into<String> + Debug, | ||
operator_environment: OperatorEnvironmentOpts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: A good place for this might be the Options
for ConversionWebhookServer
.
.context(ConvertCaToPemSnafu)?; | ||
|
||
let crd_api: Api<CustomResourceDefinition> = Api::all(client.clone()); | ||
for mut crd in crds.iter().cloned() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I feel like we shouldn't clone here, but instead mutate the CRDs (so that we also have an up2date version of the CRD in memory).
I also feel like we should be a little more clever on when to run all of the following code. We can skip running all of the below code for like 99% percent of the time, as the certificate is still valid and nothing needs to be adjusted in the conversion
section of the CRD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't really work out in terms of borrow checker (see below). Also I prefer the signature this way, as the only thing that we should change about the CRD is the conversion
(which changes every reconcile_crds
call).
Having multiple functions take owned or &mut CRDs makes this a bit less clear and some function might (accidentally) change things.
This clone is called every 7 days or so (cert lifetime), I'd say the performance is negligible here.
we should be a little more clever
Currently reconcile_crds
is called for every new cert, so every 7 days or so and every time the cert changes.
Even if there would be no change, k8s would not do anything, as the object is unchanged.
diff --git a/crates/stackable-webhook/src/servers/conversion.rs b/crates/stackable-webhook/src/servers/conversion.rs
index 03fc3b2..0d265b1 100644
--- a/crates/stackable-webhook/src/servers/conversion.rs
+++ b/crates/stackable-webhook/src/servers/conversion.rs
@@ -189,7 +189,7 @@ impl ConversionWebhookServer {
Self::reconcile_crds(
&client,
&field_manager,
- &crds,
+ crds.clone(),
&operator_environment,
¤t_cert,
)
@@ -240,14 +240,14 @@ impl ConversionWebhookServer {
mut cert_rx: mpsc::Receiver<Certificate>,
client: &Client,
field_manager: &str,
- crds: &[CustomResourceDefinition],
+ mut crds: Vec<CustomResourceDefinition>,
operator_environment: &OperatorEnvironmentOptions,
) -> Result<(), ConversionWebhookError> {
while let Some(current_cert) = cert_rx.recv().await {
Self::reconcile_crds(
client,
field_manager,
- crds,
+ &mut crds,
operator_environment,
¤t_cert,
)
@@ -261,7 +261,7 @@ impl ConversionWebhookServer {
async fn reconcile_crds(
client: &Client,
field_manager: &str,
- crds: &[CustomResourceDefinition],
+ crds: &mut [CustomResourceDefinition],
operator_environment: &OperatorEnvironmentOptions,
current_cert: &Certificate,
) -> Result<(), ConversionWebhookError> {
Co-authored-by: Techassi <[email protected]>
Co-authored-by: Techassi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a bunch of comments.
I encountered the ConversionWebhookServer which is missing the changes we talked about a few weeks back. As such, it doesn't make a whole lot of sense to continue the review before these changes are implemented. I've sent you an appropriate patch as a private message which should be a good starting point for the changes we discussed.
"--operator-namespace", | ||
"stackable-operators", | ||
"--operator-service-name", | ||
"foo-operator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I still feel like all these CLI unit tests are pretty much useless and should be removed to speed up the testing runs.
@@ -34,7 +34,12 @@ All notable changes to this project will be documented in this file. | |||
### Changed | |||
|
|||
- Update `kube` to `1.1.0` ([#1049]). | |||
- BREAKING: Return type for `ListenerOperatorVolumeSourceBuilder::new()` is no onger a `Result` ([#1058]). | |||
- BREAKING: Return type for `ListenerOperatorVolumeSourceBuilder::new()` is no longer a `Result` ([#1058]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: This is already released and as such needs to be removed here.
- BREAKING: Require two new CLI arguments: `--operator-namespace` and `-operator-service-name`. | ||
These are required, so that the operator knows what Service it needs to enter as CRD conversion webhook ([#1066]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: This is an addition and as such needs to be moved to the "Added" section. Also, slight reword:
- BREAKING: Require two new CLI arguments: `--operator-namespace` and `-operator-service-name`. | |
These are required, so that the operator knows what Service it needs to enter as CRD conversion webhook ([#1066]). | |
- BREAKING: Add two new required CLI arguments: `--operator-namespace` and `-operator-service-name`. | |
These two values are used to construct the service name in the CRD conversion webhook ([#1066]). |
- BREAKING: The `ProductOperatorRun` used for CLI arguments has some field renamed for consistency ([#1066]): | ||
- `telemetry_arguments` -> `telemetry` | ||
- `cluster_info_opts` -> `cluster_info` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Slight reword.
- BREAKING: The `ProductOperatorRun` used for CLI arguments has some field renamed for consistency ([#1066]): | |
- `telemetry_arguments` -> `telemetry` | |
- `cluster_info_opts` -> `cluster_info` | |
- BREAKING: Rename two fields of the `ProductOperatorRun` struct for consistency and clarity ([#1066]): | |
- `telemetry_arguments` -> `telemetry` | |
- `cluster_info_opts` -> `cluster_info` |
use kube::{ | ||
Api, Client, ResourceExt, | ||
api::{Patch, PatchParams}, | ||
}; | ||
use snafu::{OptionExt, ResultExt, Snafu}; | ||
use stackable_operator::cli::OperatorEnvironmentOptions; | ||
use tokio::{sync::mpsc, try_join}; | ||
use tracing::instrument; | ||
use x509_cert::{ | ||
Certificate, | ||
der::{EncodePem, pem::LineEnding}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: Please combine these imports with the others above.
ReconcileCRDs { | ||
#[snafu(source(from(ConversionWebhookError, Box::new)))] | ||
source: Box<ConversionWebhookError>, | ||
}, | ||
|
||
#[snafu(display("failed to update CRD {crd_name:?}"))] | ||
UpdateCRD { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I think we should stick with PascalCase here.
ReconcileCRDs { | |
#[snafu(source(from(ConversionWebhookError, Box::new)))] | |
source: Box<ConversionWebhookError>, | |
}, | |
#[snafu(display("failed to update CRD {crd_name:?}"))] | |
UpdateCRD { | |
ReconcileCrd { | |
#[snafu(source(from(ConversionWebhookError, Box::new)))] | |
source: Box<ConversionWebhookError>, | |
}, | |
#[snafu(display("failed to update CRD {crd_name:?}"))] | |
UpdateCrd { |
Description
Part of stackabletech/issues#642
An working example usage can be found in stackabletech/zookeeper-operator#958 (mainly look at
rust/operator-binary/src/main.rs
)Definition of Done Checklist
Author
Reviewer
Acceptance